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Abstract 



, For the solution of full-rank ill-posed linear systems a new approach based on the Arnoldi 

algorithm is presented. Working with regularized systems, the method theoretically recon- 
structs the true solution by means of the computation of a suitable function of matrix. In 
this sense the method can be referred to as an iterative refinement process. Numerical ex- 
periments arising from integral equations and interpolation theory are presented. Finally, 
the method is extended to work in connection with the standard Tikhonov regularization 
with a right hand side contaminated by noise. 
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§ '■ 1 Introduction 
o 

In this paper we consider the solution of ill-conditioned linear systems 

: ^ = b. (i) 

h ; 

We mainly focus the attention on linear systems in which A £ M 7VxAr is full rank with singular 
values that gradually decay to 0, as for instance in the case of the discretized Fredholm inte- 
gral equations of the first kind. In order face this kind of problems one typically apply some 
regularization technique such as the well known Tikhonov regularization (see e.g. [17] for a wide 
background). The Tikhonov regularized system takes the form 

(A T A + XH T H)x x = A T b, (2) 

where A £ M. is a suitable parameter and H is the regularization matrix. The system ([2]) should 
have singular values bounded away from in order to reduce the condition number and, at the 
same time, its solution x\ should be closed to the solution of the original system. 
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For this kind of problem the method initially presented in this paper is based on the shift and 
invert transformation 

Z = (A + XI)-\ (3) 

where A > is a suitable parameter and I is the identity matrix. Provided that A is large enough, 
if A is positive definite (F(A) C C + , where F(A) denotes the field of values) the shift A + XI, that 
represents the most elementary example of regularization, has the immediate effect of moving the 
spectrum (that we denote by cr(A)) away from so reducing the condition number. Moreover, 
since 

x = A~ l b = f(Z)b, 
where ^ 

f(z) = (- z ~A = (1 - A*)" 1 *, (4) 

the idea is to solve the system Ax = b by computing f(Z)b. For the computation of f(Z)b, we use 
the standard Arnoldi method projecting the matrix Z onto the Krylov subspaces generated by 
Z and b, that is K m (Z, b) = span{6, Zb, Z m ~ 1 b}. By definition of Z the method is commonly 
referred to as the Restricted-Denominator (RD) rational Arnoldi method [TT] . |25j. 
Historically, a first attempt to reconstruct the solution from x\ that solves 

(A + XI)x x = b, (5) 

was proposed by Riley in [28]. The algorithm is just based on the approximation of f(Z) by 
means of its Taylor series. Indeed we have 



1 oo 

A-'b = - £(AZ)*&, (6) 

that leads to the recursion 



A 

k=l 



Xk+i = y + XZx k , x = 0, y = Zb. (7) 

It is easy to see that the method is equivalent to the iterative improvement 

{A + XI)e k = b-Ax k 
x k+ \ = x k + e k 

generally referred to as iterated Tikhonov regularization or preconditioned Landweber iteration 
(see e.g. [H], [19], [21], [22], [26]). The main problem concerning this kind of algorithms is that 
they can be extremely slow because the spectrum of Z accumulates at 1/A (cf. ([3]), (jSJ)). This, 
of course, large values of A, that is, when A + XI is well conditioned. ^From the point of view of 
the computation of function of matrices this is a well known problem, i.e., the the computation 
by means of the Taylor series generally provides poor results unless the spectrum of the matrix 
is close to the expansion point. Indeed, from well known results of complex approximation, the 
rate of convergence of a polynomial method for the computation of a function of matrix depends 
on the position of the singularity of the function, with respect to the location of the spectrum of 
the matrix. 

We also point out that, in [5], the authors construct an improved approximation via extrap- 
olation with respect to the regularization parameter, using the singular values representation of 
the solution. Extrapolation techniques can also be applied to accelerate ([7j), as suggested in [5] 
and also indicated by Fasshauer in [12]. 



For problems in which the right hand side is affected by noise, instead of working with the 
transformation ([3]) or implicitly with systems of type (jSJ), we shall work with the standard regu- 
larization (j2j) and hence on the transformation 

Z = {A T A + XL T L)-\ 

As we shall see, the subsequent Arnoldi-based algorithm for the reconstruction of the exact 
solution will be almost identical to the one based on Q, but the use of a regularization matrix 
L different from the identity allows to define methods less sensitive to perturbations on the right 
hand side. 

The paper is organized as follows. In Section [2j we describe the Arnoldi method for the 
computation of f{Z)b and, in Section [3j we present a theoretical a-priori error analysis. In 
Section HJ we show an a-posteriori representation of the error. In Section [5j we analyze the 
choice of the parameter A. Some numerical experiments taken out from Hansen's Matlab toolbox 
on regularization [T6tll8j. and from the theory of interpolation with radial basis functions are 
presented in Section [61 Finally, in Section [7J we extend our method to the Tikhonov regularization 
in its general form (j2J) showing also some tests with data affected by noise. 



2 The Arnoldi method for f(Z)b. 

For the construction of the subspaces K m (Z, b), the Arnoldi algorithm generates an orthonormal 
sequence {vj}j >0 , with v\ = bj ||6||, such that K m (Z, b) = span {t>i, t> 2 , v m } (here and below the 
norm used is always the Euclidean norm). For every m we have 



T 
m • 



where V m — [vi, V2, v m ), H m is an upper Hessenberg matrix with entries hij = vfZvj and ej is 
the j-th vector of the canonical basis of R m . Formula (jHJ) is just the matrix formulation of the 
algorithm. 

The m-th Arnoldi approximation to x — f{Z)b is defined as 

x m = \\b\\ V m f(H m )ei. 

Regarding the computation f(H m ), since the method is expected to produce a good approximation 
of the solution in a relatively small number of iterations, that is for m <^ N, one typically considers 
a certain rational approximation to /, or the Schur-Parlett algorithm (see e.g. [T51 Chapter 11] 
or [20])- 

Denoting by II m _i the vector space of polynomials of degree at most m — 1, it can be seen 
that 

x m =p m -i(Z)b, (9) 

where p m _ x G II m _i interpolates, in the Hermite sense, the function / at the eigenvalues of 
H m 129]. 

As already mentioned, this kind of approach is commonly referred to as the RD rational 
Arnoldi method since it is based on the use of single pole rational forms of the type 

R m -i(x) = , 9m ~ 1 |^ i ) aeR, Qm — i e n m _i, m > 1, 
(x + a) m 1 



introduced and studied by N0rsett in [27] for the approximation of the exponential function. In 
other words, with respect to A, formula (Q is actually a rational approximation. 



It is worth noting that, at each step of the Arnoldi algorithm, we have to compute the vectors 
Wj = Zvj, j > 1, which leads to solve the systems 

(A + XI)wj =Vj, j > 1. 

Since V\ = bj \\b\\, the corresponding wi is just the scaled solution of a regularized system (with 
the rough regularization A — > A + XI). In this sense if A arises from the standard techniques that 
seek for the optimal regularization parameter X opt (L-curve, Generalized Cross Validation, etc.) 
this procedure can be employed as a tool to improve the quality of the approximation 
Anyway we shall see that, using the Arnoldi algorithm, larger values for A are more reliable. 



3 Error analysis 

The error E m := be expressed and bounded in many ways (see e.g. the recent paper [1] 

and the references therein). In any case, however, the sharpness of the bound essentially depends 
on the amount of information about the location of the field of values of Z, defined by 



The bound we propose is based on the use of Faber polynomials. We need some definitions and 
we refer to [30] or [31] for a wide background of what follows. 

Let Q be a compact and connected set of the complex plane. By the Riemann mapping 
theorem there exists a conformal surjection 

V> : C\{w : \w\ < 1} -»C\fi, ^(oo) = oo, ^'(oo) = 7, (10) 

that has a Laurent expansion of the type 

c l c 2 

ip(w) = 7W + c H 1 H 

w ur 

The constant 7 is the capacity of Q. If Q is an ellipse or a line segment then q = for % > 2. 
Given a function g analytic in Q, it is known that defining p m -i as the truncated Faber series 
of exact degree m — 1 with respect to g and ip, then p m -i provides an asymptotically optimal 
uniform approximation to g in Q, that is 

lim sup ||p m _i - g\\K m = lim sup \\p* m -i - g\\l! m , (H) 

m— too m— too " " ai 

{Pm-i ( z )} m >i De i n g t ne sequence of polynomials of best uniform approximation to g in f2. Prop- 
erty (fTTl) is also called maximal convergence. Let moreover <ft : C\Q — > C\{w : \w\ < 1} be the 
inverse of tp. For any r > 1, let r r be the equipotential curve 

T r :={z:\<P(z)\=r}, 

and let us denote by fl r the bounded domain with boundary T r . Let r > 1 be the largest number 
such that g is analytic in Q r for each 7 < r < r and has a singularity on Tf. Then, it is known 
that the rate of convergence of the sequence {p m -i ( z )} m >i is given by 

lim sup ||p m _i - g\\H m = x- (12) 

m—>oo r 

For this reason we know that superlinear convergence is only attainable for entire functions, where 
asymptotically one can set r := m. In order to derive error bounds for the computation of f(Z)b 
we need the following classical result 



Theorem 1 |72J/ Let Q be a compact and convex subset such that g is analytic in Q. For 
1 < r <r the following bound holds 

1 

\Pm-i -g\\ a < 2 Mlrv p ^ 

r 

Using the above theorem, for our function f(z) = z/(l — Xz), singular at 1/A, we can state 

the 

Proposition 2 Assume that Q is an ellipse of the complex plane, symmetric with respect to the 
real axis with associated conformal mapping if>{w) = •yw + Co + ci/w. Assume that if>(l) < 1/A 
and let r be such that if)(r) = 1/A. Let moreover m be the smallest integer such that 

r _ 

< r - 1. 



m + 1 

Then for m>m 



XII AC lit I -L lib T i- ft a\ 

\\Vm-l ~ J\\q < — i \ i \2„/,/^ cm ' ( J 



2emr 1 m + 1 

m(r — 1) — 1 X 2 if>'(r) r 
and for m <m 

4 / 2 \ m f+l 

lbm-i - f\\ n < A 2 _ 1) ^/(f) ^f^T J FTi- (15) 

Proof. Let r = 9 — e, with < e < f"— 1. By the properties of Q, we have 

if){r) 



lr '' l-A^(r)' 
and, by direct computation 

(r — e)r 

Hence using if) (r ) = 1/ A we find 

■0(r) 



r r < 



1 - A ip{?) - 7£ + 



gig 



r — £)r 



X 2 e 7 



Cl 



r — £ r 



1 

< 



X 2 Elfj>(?) ' 



By (|T3l) . we thus obtain 



WPm-i - /lln < x^ir-er^]^ - (16) 

r — £ 



Now setting 



since this value minimizes 



£ (r — e) 



let m be the smallest positive integer such that 

= < r - l. 

m + l 

By inserting (ITTl) into ( JT6l) and using 

1 mr 

< 



1 m(r — 1) — 1 
r — e 

we find ffT4"j) . For m < m we can take for instance 

r — 1 
£ = — - — • 



Substituting ([T8J) into ([T6J) we obtain (03} . ■ 

Remark 3 iVoie t/iat t/ie assumption < 1/A m Proposition^ just means that the ellipse is 
strictly on the left of the singularity of f . 

Regarding the field of values of Z, F(Z), it is well known that it is convex, that c(Z) C F(Z), 
and that F(H m ) C F{Z) (where H m is defined in Section [2}. Of course if C C+ (A is 

positive definite) then F(Z) C {z E C : < Re(z) < 1/A} and the corresponding / is analytic in 
F(Z). Using these properties we can state the following result 

Theorem 4 Assume that F(A) C C + . Let Q be an ellipse (with associated conformal mapping 
ip, and inverse 4>) symmetric with respect to the real axis and such that F(Z) C Q with f analytic 
in Q. Then, form large enough, we have 

\\E m \\<4ecJ-4=K??±±, 
r — 1 ip'if) r" 1 

where K = 1/A 2 , r — 0(1/A), and C = 11.08 (C = 1 if A is symmetric). 

Proof. Using the properties of the Arnoldi algorithm, we know that for every p m -i £ n m _i, 

(H m )ei = p m -i(Z)b. (19) 

Hence, from ( fl9l) . it follows that, for m > 1 and for every p m -i G n m _i, 

E m = x - x m = f{Z)b - p m _i(Z)6 - V m {f{H m ) - p m _i(i/m))ei. (20) 

Since ||T^n|| = 1 we have (see [9]) 

ll-^mll < 2C ||p m _i - f\\ F{z) ■ (21) 

Therefore taking p m -\ as the (m — l)-th truncated Faber (Chebyshev) series, the result follows 
from Proposition [2] since F(Z) Cfi. | 



Remark 5 By &ty) , if both Z and H m are diagonalizable then C in l[21\) is a constant depend- 
ing on the condition number of the diagonalization matrices and Q can be taken as an ellipse 
containing o~{A). 

Theorem H] is surely important from a theoretical point of view since it states that the Arnoldi 
algorithm produces asymptotically optimal approximations. However, if we consider for simplicity 
the symmetric case, we can also understand that it cannot be used to suggest the choice of A. 

Indeed, let Ai > and A a? be respectively the smallest and the largest eigenvalues A. Then 
F(A) = [Ai, X N ] and 

1 1 



F{Z) 

In this case, by f [2Tj) we have 



X N + A' Ai + A 



=: h 



\\E m \\ < 2 max | /(z) - p m -i{z)\ . 

As already mentioned, the conformal mapping ip associated to I\ takes the form 

ip{w) = 7W + c + — (22) 



w 



where 

7 



1/1 1 \ 1 \ N - Ai 



4\Ai + A Aiv + Ay 4 (Ai + A) (X N + A)' 



1 / 1 1 \ = 1 A^ + Ax + 2A 

°° 2 VA! + A + A.v + Ay' 2 (X 1 + A) (A^ + A) ' 1 ' 

ci = 7. 

For r > 1, fl r is the confocal ellipse (foci in and -) described by ib(re %e ), < 9 < 2n. 

v \ N + A Ai + A ; 

Since f(z) is singular at 1/A, f is the solution (> 1) of 

7 1 

ir + c + ^ = - (24) 
r A 

that is 

r = u + y/u 2 -l, (25) 

where 

u _ 2\i\ N ^ X N + Ai 
A(Atv — Ai) Atv — Ai 

Thus, r monotonically decreases with respect to A and r — > 00 for A — > 0. 

The above arguments simply show that the error analysis does not take into account of the 
computational problems in the inversion of A + XI for A ~ 0. The method is very fast for 
A ~ because, at each step, we are inverting something very close to the original operator A. In 
order to derive a more useful estimate one should modify the above analysis imposing in some 
way the requirement A 3> Ai. In some sense this will be done in Section |5] where we consider 
the conditioning in the computation of f(Z)b that is obviously closely related to the rate of 
convergence of any iterative method. 



4 A-posteriori error representation 

By a result on Pade-type approximation proved in [3], we know that the Hermite interpolation 
polynomial of the function 

9(s) = - J— 
1 — st 

at the zeros of any polynomial u m of exact degree m in s is given by 

i2m-l(s) = 1 



1 - st \ v m (t l ) 
Setting A = t~ l , we have that 

/(0 = = -^g (r 1 ) , 

and so 

-^-^T^O-w) (27) 

interpolates /(£)• By Q let p m _ x G IT™.! be the polynomial that interpolates, in the Hermite 
sense, the function f(z) at the eigenvalues of H m , £i,...,£ m ', m! < m, with multiplicity fcj, 
i = 1, m'.Then 

= -A-^-iCC 1 ) = l<i<m\0<j<h-l. 

By ff27|) and using the above relation is it easy to see that u m (s) = det(sJ — H^). In this way, 
by direct computation, 

X m = P m -i(Z)b, 



V v m (\) J 



(21 



Since, of course, A 1 and Z 1 commute, we find 



\ x m ~A\ < \\v m {A + 



\\x\\ \ u m(X)\ 
A posteriori error estimate can be derived in this way. Since 

u m (s) = det(sJ — H^), 

s m det(H m - s^I) 
detH m 

defining q m (£) = det(H m — £1), we have 



INI " A m |g m (A- 1 )| 
It is worth noting that, using the relation 

q m (Z)b = (IJ v m+ i 

(see [25]), we obtain from ( |28|) 



x m - x\\ < \\(A + XI) m q m {Z)\\ ^ (2g) 



- X H = — Tvr^r \\ A ^ A + A/) m *wi|| , 

A m |g m (A x )| 

which proves the convergence in a finite number m* < AT of steps of the method in exact arith- 
metics. Note that by (1281) the corresponding v m * is the minimal polynomial of A + XI for the 
vector 6. 



5 The choice of A 



As already mentioned, the arguments of Section [3] reveal that the standalone error analysis of 
the computation of f(Z)b is not reliable to suggest the choice of A, since k(Z) — > k(A) as A — > 
(/«(•) denoting the standard condition number of a matrix). In other words, it does not take into 
account that, at each step, we need to solve a system with the matrix A + XI. At the same time, 
focusing the attention on the accuracy (so neglecting the rate of convergence) one could expect 
that "large" values of A should allow an improvement of it, since the linear systems with A + XI 
would be solved more accurately. The numerical experiments show that this is not true, as shown 
in Fig. [TJ where we consider the problem BAART, taken out from the Hansen's Matlab toolbox 
Regtools (see [16] and [IS]). 




2 4 6 8 10 12 14 16 18 

number of iterations 



Figure 1: BAART(40) - Minimum attained error with respect to the number of iterations for 
different values of A. 

Indeed the diagram of Fig. [1] represents the standard situation, that is, increasing A, we have 
a loss of accuracy. The behavior on the leftmost part of the diagram is clear since it is due to the 
conditioning of Z for A small. On the rightmost part we have again a loss of accuracy but now it 
depends on the numerical instability in the computation of f{Z) for A large (the problem can be 
easily observed even working scalarly). This observation leads us to consider the conditioning in 
the computation of f(Z)b for having a good strategy to define A. 

The absolute and the relative condition number for the computation of g{X) where g is a 
given function and X a square matrix are given by (cf. [2D] Chapter 3) 

£ ^°l!£||<£ 6 
II A II 

K r {g,X) = K a (g,X) , (31) 

llfl'WII 



and these definitions imply that 

\\g{X + E) - g{X)\\ < K a (g,X) \\E\\ + 0(||^|| 2 ). 
Proposition 6 For the function f(z) = (1 — Xz)~ 1 z we have the bound 

r.ff ZX 11 ^^ 11 ^ (32) 

Proof. In order to derive first the absolute condition number we have 

f(Z + E)-f(Z) = [(Z + E)- 1 - XI}- 1 - (Z' 1 - A/)- 1 , 

= [(/ + Z- l E)- l Z- 1 - XI] _1 - (Z- 1 - XI)- 1 , 
= [Z- 1 -XI + K^Z.E)]' 1 -(Z- 1 -XI)- 1 , 

where 

oo 

A{Z,E) := ^{-ifiZ^EfZ- 1 . 

k=l 

Hence 

f(Z + E)-f(Z) = [l + {Z- 1 -XI)- 1 X{Z,E)Y l {Z- 1 -XI)- 1 -{Z- 1 -XI)-\ 

= V °° (-lytZ- 1 - XI)- ] A(Z, E) j {Z- 1 - XI)- 1 - (Z- 1 - XI)- 1 , (33) 

and finally 

\\f(Z + E) - f(Z)\\ < \\iZ~ 1 - Xiy'Z-'EZ-^Z- 1 - A/)- 1 !! + 0(\\Ef), 

so that 

n a (f,Z) < ||(/-AZ)- 2 ||, 
that proves (1321) using (13T1) and the definition of f(z). Note that by (1331) 

L(Z, E) := (I - XZ)- 1 E{I - XZ)- 1 

is the Frechet derivative of / at Z applied to E. ■ 

This Proposition simply shows that the problem is well conditioned for A — > and ill condi- 
tioned for A ^> 0, that matches with the error analysis of Section [3j Of course the situation is 
opposite to what happens for the solution of the linear systems with A + XI during the Arnoldi 
process. Therefore the idea, confirmed by many numerical experiments, is to define A such that 
K r (f, Z) « k(A + XI), that is, to consider the bound (|32|) and solve the equation 

'lKz-"- ) Ajf'? =ll(j4 + AJ)l| ll(' 4 + AJ )" 1 ll- 

In the SPD case everything becomes clear since we have 

||(7- XZ)- 2 \\ \\Z\\ A + Ai 



\\(Z-i - xi)-i\\ 
MA + xmUA + xiy'W -. 



x N + x 



Ai + A 



that for Ai — > leads to 

a = v / a7a^ + o(a 1 ). 



Remark 7 If the underlying operator is bounded then one may consider the approximation 

\Ai^v « — — for Ai -»■ 0. 



Remark 8 In t/ie S'PD case, taking A* = V^iA/v one? putting it into l[2l)\) - l[26}) , we find that the 
asymptotic convergence factor of the method is given by 

1/m ^ 1 _ Af - A 1 / 4 _ - 1 



Remark 9 T7ie choice of X* has another interesting meaning. Indeed, let us consider the problem 
of the computation of g{A)b with g singular only at and A SPD. Using the transformation 
z = (a + A) 1 (cf. (EPj, if the corresponding g*{z) = gizT 1 — A) has a non-removable singularity 
at 0, then the optimal choice of A is given by solving the equation 

= k (34) 

(cf. ( dip and (flgj) ). that is, the midpoint of [0, 1/A] must be equal to the midpoint of I\, because 
in this way we have simultaneously ip(—r) = and ip(r) = 1/A. A straightforward computation 
shows that solving ( pOj leads exactly to A*. For instance, in f23tf the author uses the RD Arnoldi 
method to compute y/Ab and obtains the same result even if following a different approach. 

Remark 10 The condition number of A + X*I is given by 



Xn + vAiA/v / A 



In the nonsymmetric case, the analysis is a bit more difficult but many numerical experiments 
have shown that just having information on the conditioning of A, the choice A ~ k(A)^ 1 ^ 2 is 
generally satisfactory, that is, we are rather close to the minimum of a curve similar to the one 
of Fig. [TJ For very ill-conditioned problems we suggest to define A a bit larger, say in the range 
10k(t4)~ 1 / 2 -t- 100k(A)~ 1 ^ 2 , since the errors generated by the solution of the linear systems might 
be much larger than the machine precision. 



6 Numerical experiments 

In order to test the efficiency of our method, that from now on we denote by RA (Rational 
Arnoldi), we consider here some numerical experiments where we compare it with other classical 
iterative solvers. The RA method have have been implemented in Matlab following the line of 
Algorithm [T] described below. 

It is worth noting that we make use of the LU (or Cholesky) factorization to solve the linear 
system at each step. The reason is to reduce the computational cost since the factorization is 
computed only once at the beginning, taking also into account that A + XI should be relatively 
well conditioned. Anyway, for large scale non-sparse problems an iterative approach producing 
an inner-outer iteration should be considered. 

We consider four classical test problems taken out from Hansen's Matlab toolbox Regtools, 
GRAVITY, FOXGOOD, SHAW and BAART. These discrete linear problems arise from the 
discretization of Fredholm integral equations of the first kind. In all experiments, we consider a 



Algorithm 1 - RA Algorithm for solving Ax = b. 



i: Require A G R NxN , b G R N , A G R 
2: Define / = (1 - Az)- 1 ^ 

3: if (A + A/) is SPD, then Compute L s.t. (4 + XI) = L L T 
else Compute L, U s.t. (A + A/) = LU, end if 

4: Ul<-6/||&||,Vi<-[wi] 

5: for m = 1,2,... do 

5.1: Update i7 m G ]R mxm by Arnoldi's algorithm 

Remark: In the Arnoldi's algorithm, we compute w m = Zv m 

solving (A + \I)w m = v m , that is w m = U~ 1 L~ 1 v m or w m = (L T ) _1 L -1 w, 

5.2: Compute f(H m ) by Schur-Parlett algorithm 

5.3: x m <- \\b\\V m f(H m ) ei 

5.4: Output x m , approximation of f(Z)b = A~ 1 b 
5.5: Update Vm + i = [i>i, . . . ,v m+ i] G M Arx ( m+1 ) orthonormal basis for 
K m+ i(Z,b), by Arnoldi's algorithm 
end for 



noise-free right hand side, that is, we define b = Ax. The numerical results have been obtained 
with Matlab 7.9, on a single processor computer Intel Core2 Duo T5800. 

Tables [1] and [2] below summarize the results. For comparison, we consider the codes ART, 
CGLS, LSQR_B and MR2 taken out from Hansen's toolbox, CG, GMRES and MINRES that are 
resident Matlab functions, and Riley's method. The number between parentheses beside the name 
of the test is the dimension of the system. In all tests Xra and Xmey denote the chosen values 
of the parameters for the RA and Riley's method respectively. Since no general indication about 
the choice of the parameter for Riley's method is available in the literature, in all experiments we 
heuristically select a nearly best one. In the tables we consider the minimum attained error norm 
err, the corresponding residual res and the number of iterations nit. Each method was stopped 
when the number of iterations reaches the dimension of the system. The missing numbers are 
due to the structure of the coefficient matrix (symmetric, SPD, and so on). 
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Table I: Results for GRAVITY and FOXGOOD. 





SHAW(64) 


BAART(120) 


— r ? 

■^RAj A Riley 


le-9, le-10 


le-8, le-10 




err 


res 


nit 


err 


res 


nit 




o.oe-o 


z.Ue-7 


7 


o.oe-o 


l.oe-o 


D 


VjrlVlrtll/O 








9.6e-6 


1.4e-15 


15 


ART 


7.7e-l 


6.8e-2 


64 


3.4e-l 


2.7e-2 


120 


CGLS 


2.8e-2 


5. le-10 


64 


2.4e-2 


1.7e-14 


120 


LSQR_B 


2.8e-2 


1.5e-10 


62 


2.4e-2 


2.4e-15 


120 


MR2 


1.6e-l 


3.7e-6 


15 








MINRES 


1.0e-2 


1.2e-ll 


64 








RILEY 


9.6e-3 


8.0e-10 


2 


1.3e-5 


1.3e-10 


2 



Table 2: Results for SHAW and BAART. 

The results of Tables [T] and 12] are of course encouraging, especially considering the accuracy 
with respect to the number of iterations. Indeed, both RA and Riley's method require a linear 
system to solve at each step, and so it is fundamental to keep the number of iterations low. 
However, it is worth pointing out that, in the experiments, such linear systems are solved with 
the LU or Cholesky factorization, so that most part of the computational cost is due to the first 
iteration. 

A classical drawback of many iterative solvers for ill-conditioned problems is the so-called 
semi- convergence (see e.g. [2j), that is the iterations initially approach the exact solution but 
quite rapidly diverges. This phenomenon is very common in particular for iterative refinement 
methods (thus for Riley's and RA) where there is a heavy propagation of errors. Of course, 
unless a sharp error estimator is available, this undesired behavior can be quite dangerous for 
applications. In order to understand what we can do to face this problem, in Fig. [2] we consider 
the error behavior of the RA method for BAART changing the value of the parameter. 

Looking at Fig. [21 we can observe that increasing A the procedure becomes absolutely stable, 
even if we have to pay a small price in terms of accuracy. Therefore, for applications in which 
it is not possible to monitor in some way the accuracy step by step, the semi-convergence can 
be prevented taking n(A)~ l l 2 <C A < /t(v4) -1//4 , thus looking for a compromise between accuracy 
and stability. On the other side, reducing A, the method is really fast but also highly unstable. 
This last consideration is particularly true for Riley's method, where, at least for these kind of 
problems, one always observes a rapid divergence after a couple of iterations, also for relatively 
large values of A. 

In this Section, we also look at another classical example coming out from approximation 
theory. We consider in particular the reconstruction of the Franke's bivariate test function via 
interpolation by means of Gaussian Radial Basis Functions (RBF) with shape coefficients equal 
to 1 (see e.g. (13] for a background). For simplicity, instead of scattered points, we consider here 
the very special case of a grid of 15 x 15 equally spaced points on the square [0, 1] x [0, 1] that leads 
to a SPD linear systems of dimension 225 whose condition number is about 10 21 . In Fig. El the 
surfaces obtained with the Cholesky factorization, the CG and the RA method (with A = 10~ n ) 
are plotted. Since the exact solution of the system is unknown, we used the residual as a stopping 
criterion, so that the CG result corresponds to the iteration 190 (residual ~ 1.6e — 1), while the 
RA result corresponds to the iteration 10 (residual ~ 1.4e — 1). 

While the result with the Cholesky factorization was expected (a similar test have been pre- 
sented in [12]), the difficulties with Krylov methods were not. Indeed, the CG method has shown 
to be the best Krylov method for this problem, but the results are poor if compared with those 
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Figure 2: BAART(120) - Error behavior for A = lO" 4 , 1(T 6 , 1(T 8 , 10" 



of the RA method. We have to point out that, for this case, the reconstruction given by the RA 
and the Riley's method are very similar. 

7 Extension to Tikhonov regularization 

In many applications it is often necessary to deal with ill-conditioned linear systems in which the 
right hand side is affected by noise. Defining e& as a perturbation (of course unknown) of the 
right hand side b, one is forced to solve in some way 



hoping that the computed solution of fl35l) is close to the solution of Ax = b. In this situation, 
the RA method does not seem to be so powerful and robust as in the noise-free case. Moreover, 
unless the noise level is very low, it is also difficult to design a strategy to define the parameter 
A. Indeed, in order to adopt the theory of Section [5] based on the analysis of the conditioning, 
we should need, for instance, to construct an invertible linear filter F such that Feb ~ 0. In this 
way F~ x Ax ~ b, and hence information on the choice of A can be obtained considering k{F~ 1 A). 
Anyway this kind of approach is beyond the purpose of this paper, and we prefer to extend the 
idea of the RA method in order to make it able to work directly with Tikhonov regularization in 
its standard form. 

As well known Tikhonov regularization is based on the solution of the minimization problem 



where the matrix H is generally taken as an high-pass filter (e.g. the second derivative) so that 

1 1 2 m 

the term \\Hx\\ plays the role of the penalization term in a constrained minimization. The main 



Ax = b, b := b + e^, 



(35) 




A > 0, 



(36) 



Franke Cholesky 




Figure 3: Interpolation of Franke's bivariate test function by means of Gaussian RBF. 

problem is that the noise generally involves also frequencies of the exact solution so that it is not 
possible to solve ( 1361) letting A — > oo as in standard constrained minimization. Anyway, defining 
suitably A (see [TF] for a background), the corresponding solution x\ is expected to be somehow 
similar to the desired noise-free solution. The problem ( 136]) leads to the solution of the regularized 
system 

(A T A + XH T H)x x = A% (37) 

where the matrix A T A + \H T H is also expected to be better conditioned than A. 
Following the idea of the RA method, we consider here the transformation 

Z = {A T A + XH T H)-\ 

Since the exact solution can be written as x = (A T A) 1 A T b, we have 

x = (Z- 1 -\H T HY l A T b, 
= f(Q)(H T H)- 1 A T b, 

where 

Q = Z (H T H) = ((H T H)~ l A T A + \I^j \ 

Note that we are assuming to work with the exact right hand side even if, in practice, the method 
is applied with b. 

Hence we can compute the solution working with the Arnoldi algorithm based on the con- 
struction of the Krylov subspaces K m (Q, (H T H) A T b). Thus, starting from v\ = vj ||t> ||, where 
v is the solution of 

(H T H) v = A T b, (38) 



we need to compute, at each step of the algorithm, the vectors Wj 
to solve systems of the type 



Qvj, j 



> 1, that is, we need 



(A T A + XH T H)w 3 = (H T H) 



3 ' 



Note that by (|38|) and the arising definition of v%, the first step of the Arnoldi algorithm yields 
the Tihhonov regularized solution x\ (cf. (|37|) ). Hence, also in this case, the procedure can be 
interpreted as an iterated Tikhonov regularization. 

In order to appreciate the potential of this extension (that we indicate by RAT, Rational- 
Arnoldi-Tikhonov) we consider the test problem SHAW and BAART with a right hand side 
contaminated by an error e b defined by 



S 



e b 



u. 



where 5 is the relative noise level, and u is a vector containing random values drawn from a normal 
distribution with mean and standard deviation 1. In the experiments, we define 5 = 10~ 3 , and, 
as suggested in [8], we take as regularization matrix 



/ 2 -1 
-1 2 -1 



H 



\ 



V 



-1 2 -1 
-1 2 



dTVxTV 



Indeed, at least for these experiments, this choice produces better results than the classical 
(N — 2) x N matrix representing the second derivative operator. Since the noise is randomly 
generated, for both examples we consider two tests, and we compare the RAT method (with 
different values of the parameter A) with GMRES, ART, LSQR_B and MR2. The results are 
collected in Table EJ 
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Table 3: Minimum attained error and corresponding iteration number for SHAW and BAART 
with Gaussian noise of level 5 = 10 -3 



Similarly to the noise-free case, we also consider the stabilizing effect of a careful choice of A. 
Indeed, in Figure 4 we plot the error behavior of some of the methods considered for the solution 
of SHAW(64). Taking A = 10 for the RAT method, we can overcome the problem of semi- 
convergence keeping at the same time a good level of accuracy contrary to other well performing 
methods such as GMRES and LSQR_B. 




Figure 4: Error behavior for SHAW(64) with noise. RAT method is implemented with A = 10. 



8 Conclusions 

Our experience with the RA and the RAT methods leads us to consider these methods as reliable 
alternatives to the classical iterative solvers for ill-conditioned problems. Since they actually 
are iterative refinement processes, the attainable accuracy is almost never worse that the other 
solvers. While this property could be somehow expected, maybe the most important feature of 
these methods is their robustness. Indeed, contrary to other iterative refinement processes such 
as the Riley's algorithm, the methods work pretty well for a large window of values of A. Hence, 
having a good error estimator or working with applications in which it is possible to monitor the 
result step by step, one may reduce A in order to save computational work; in the opposite case, 
one may increase A slowing down the method but assuring a stable convergence. To this purpose, 
we intend to use, in a forthcoming work, the estimates of the norm of the error described in jl] 
and [7] which are based on an extrapolation procedure of the moments of the matrix of the system 
with respect to the residuals of the iterative method. 
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